Modified LTSE-VAD Algorithm for Applications Requiring Reduced Silence Frame Misclassification

Authors

  • Iker Luengo
  • Eva Navas
  • Igor Odriozola
  • Ibon Saratxaga
  • Inma Hernáez
  • Iñaki Sainz
  • Daniel Erro
Abstract

The LTSE-VAD is one of the best-known algorithms for voice activity detection. In this paper we present a modified version of this algorithm that bases the VAD decision not on the estimated background noise level but on the signal-to-noise ratio (SNR). This makes the algorithm robust not only to noise level changes, but also to signal level changes. We compare the modified algorithm with the original one and with three other standard VAD systems. The results show that the modified version achieves the lowest silence misclassification rate while maintaining a reasonably low speech misclassification rate. As a result, this algorithm is more suitable for identification tasks, such as speaker or emotion recognition, where silence misclassification can be very harmful. A series of automatic emotion identification experiments is also carried out, showing that the modified version of the algorithm helps increase the correct emotion classification rate.
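The idea of making the decision depend on SNR rather than on the noise level alone can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the decision rule, so `snr_threshold` and all of its numeric values are hypothetical, and the envelope and divergence functions follow the commonly cited long-term spectral envelope and divergence definitions as an assumption. Inputs are per-frame magnitude spectra supplied by the caller.

```python
import math

def ltse(spectra, l, order):
    """Per-bin maximum magnitude over frames l-order .. l+order (clipped at the edges)."""
    lo = max(0, l - order)
    hi = min(len(spectra), l + order + 1)
    n_bins = len(spectra[0])
    return [max(spectra[j][k] for j in range(lo, hi)) for k in range(n_bins)]

def ltsd_db(spectra, noise, l, order):
    """Long-term spectral divergence (dB) between the envelope and a noise magnitude spectrum."""
    env = ltse(spectra, l, order)
    ratio = sum((e * e) / (n * n) for e, n in zip(env, noise)) / len(env)
    return 10.0 * math.log10(ratio)

def snr_threshold(snr_db, snr_lo=0.0, snr_hi=30.0, thr_lo=3.0, thr_hi=12.0):
    """Hypothetical rule: threshold interpolated linearly in SNR and clamped.
    The break points and threshold values are illustrative assumptions."""
    if snr_db <= snr_lo:
        return thr_lo
    if snr_db >= snr_hi:
        return thr_hi
    frac = (snr_db - snr_lo) / (snr_hi - snr_lo)
    return thr_lo + frac * (thr_hi - thr_lo)

def vad(spectra, noise, snr_db, order=1):
    """Mark a frame as speech when its divergence exceeds the SNR-dependent threshold."""
    thr = snr_threshold(snr_db)
    return [ltsd_db(spectra, noise, l, order) > thr for l in range(len(spectra))]

# Toy input: seven two-bin frames, one loud frame at index 3, flat unit noise.
frames = [[1.0, 1.0]] * 3 + [[10.0, 10.0]] + [[1.0, 1.0]] * 3
decisions = vad(frames, noise=[1.0, 1.0], snr_db=15.0, order=1)
```

Because the envelope takes a maximum over neighbouring frames, the single loud frame in the toy input also marks its immediate neighbours as speech. How the SNR itself is estimated is left outside this sketch.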


Similar resources

A voice activity detector for the ITU-T 8kbit/s speech coding standard G.729

Voice Activity Detectors (VADs) are widely used in speech technology applications where available transmission or storage capacity is limited (e.g. mobile, DCME, etc.) and must be utilised with maximum economy. Modern day digital speech coding algorithms can provide toll quality speech at bit-rates as low as 8kbit/s (e.g. ITU-T G.729) and the use of a VAD can achieve further economy in average...


Concealment of Information in Inactive Audio Frames of VoIP

Steganography is the hiding of a secret message within an ordinary message and the extraction of secret message at its destination. In digital steganography, electronic communications may include steganographic coding inside of a transport layer, such as a document file, image file, program or protocol. This paper describes how to segregate the audio that are streaming in the Voice over Intern...


Improved End-of-Query Detection for Streaming Speech Recognition

In many streaming speech recognition applications such as voice search it is important to determine quickly and accurately when the user has finished speaking their query. A conventional approach to this task is to declare end-of-query whenever a fixed interval of silence is detected by a voice activity detector (VAD) trained to classify each frame as speech or silence. However silence detectio...


Simultaneous gender classification and voice activity detection using deep neural networks

This paper proposes a novel technique for simultaneously executing gender classification and voice activity detection (VAD) using Deep Neural Networks (DNNs). Speaker information such as gender is important in some speech recognition applications such as recommendation systems and trend analysis. Usually, gender classification is applied after speech segments are detected by VAD. In previous st...


Noise Estimation based on Entropy without using VAD for Speech Enhancement

A practical speech enhancement system consists of two major components, the estimation of noise power spectrum, and the estimation of speech. In single channel speech enhancement systems, most algorithms require an estimation of average noise spectrum since a secondary channel is not available. This requires a reliable speech/silence detector. Thus the speech/silence detection can be a determini...




Publication date: 2010